Identifying health status of wind turbines by using self organizing maps and interpretation-oriented post-processing tools
Identifying the health status of wind turbines is critical to reducing the impact of failures on generation costs (between 25–35%). It is also a time-consuming task, since a human expert has to examine turbines individually. Methods: To optimize this process, we present a strategy based on Self Organizing Maps (SOM), clustering and a further grouping of turbines based on the centroids of their SOM clusters, generating groups of turbines that behave similarly with respect to subsystem failures. The human expert can then diagnose the health of the wind farm by analysing a small sample from each group. Post-processing tools such as Class panel graphs and Traffic lights panels enhance the conceptualization of the clusters, providing additional information about the real scenarios the clusters represent and contributing to a better diagnosis. Results: The proposed approach has been tested on real wind farms with different characteristics (number of wind turbines, manufacturers, power, type of sensors, ...) and compared with classical clustering. Conclusions: Experimental results show that healthy, unhealthy and intermediate states are detected. Moreover, the operational modes identified for each wind turbine outperform those obtained with classical clustering techniques by capturing the intrinsic stationarity of the data.
Peer reviewed. Postprint (published version).
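The SOM-plus-grouping pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the map size and the learning-rate and neighbourhood schedules are arbitrary, each turbine's "signature" is taken as the centroid of the SOM prototypes its samples map to (a simplified stand-in for the centroids of its SOM clusters), and turbines are then grouped with plain k-means. The function names `train_som`, `turbine_signature` and `group_turbines` are hypothetical.

```python
import numpy as np

def train_som(data, rows=3, cols=3, iters=500, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular Self-Organizing Map with online updates."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the prototype vectors with randomly chosen data samples.
    weights = data[rng.choice(len(data), rows * cols, replace=False)].astype(float)
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.1   # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
        weights += lr * h[:, None] * (x - weights)
    return weights

def turbine_signature(weights, samples):
    """Centroid of the SOM prototypes onto which one turbine's samples are mapped."""
    d2 = ((samples[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return weights[np.argmin(d2, axis=1)].mean(axis=0)

def group_turbines(signatures, k=2, iters=20):
    """Plain k-means on turbine signatures with a deterministic farthest-point start."""
    centers = [signatures[0]]
    while len(centers) < k:
        dist = np.min([((signatures - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(signatures[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d2 = ((signatures[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = signatures[labels == j].mean(axis=0)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical SCADA features: two turbines near one operating point, two near another.
    turbines = [rng.normal(0.0, 0.3, (50, 2)) for _ in range(2)]
    turbines += [rng.normal(3.0, 0.3, (50, 2)) for _ in range(2)]
    som = train_som(np.vstack(turbines))
    sigs = np.array([turbine_signature(som, X) for X in turbines])
    print(group_turbines(sigs))  # turbines 0 and 1 share a label, 2 and 3 the other
```

In practice the expert would then inspect one or two turbines per group rather than every turbine, which is the time saving the abstract describes.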
Learning from Incomplete Features by Simultaneous Training of Neural Networks and Sparse Coding
In this paper, we address the problem of training a classifier on a dataset with incomplete features. We assume that a different subset of features (random or structured) is available for each data instance, a situation that typically arises in applications where not all features are collected for every data sample. We develop a new supervised learning method to train a general classifier, such as a logistic regression or a deep neural network, using only a subset of features per sample, while assuming that the data vectors have sparse representations on an unknown dictionary. We identify sufficient conditions under which, if a classifier can be trained on incomplete observations so that their reconstructions are well separated by a hyperplane, the same classifier also correctly separates the original (unobserved) data samples. Extensive simulation results on synthetic and well-known datasets validate our theoretical findings and demonstrate the effectiveness of the proposed method compared with traditional data imputation approaches and one state-of-the-art algorithm.
Affiliations: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina. Wang, Ziyao. South East University; China. Sole Casals, Jordi. University of Vic; Spain. Zhao, Qibin. Center for Advanced Intelligence Project; Japan.
IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2021, New York, United States. IEEE.
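To make the "sparse representation from a subset of features" idea concrete, the sketch below shows only the sparse-coding step: given a mask of observed features, it finds a sparse code using the observed entries alone (ISTA on a masked lasso objective), and `D @ z` then gives a reconstruction of the full vector. It assumes a fixed, known dictionary `D`; the paper instead learns the dictionary and trains the classifier jointly, which is not shown. `masked_sparse_code` and its parameters are hypothetical names.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def masked_sparse_code(x, mask, D, lam=0.01, iters=500):
    """ISTA for  min_z 0.5*||m * (x - D z)||^2 + lam*||z||_1,
    where m is a 0/1 mask of observed features (missing entries of x are ignored)."""
    m = mask.astype(float)
    Dm = D * m[:, None]                          # zero the rows of missing features
    L = np.linalg.norm(Dm, 2) ** 2 + 1e-12       # Lipschitz constant of the gradient
    xm = np.where(mask, x, 0.0)                  # missing values may be NaN
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = Dm.T @ (Dm @ z - xm)              # gradient of the masked data term
        z = soft_threshold(z - grad / L, lam / L)
    return z
```

The classifier in the paper then operates on such reconstructions; the sufficient conditions it states concern when separating the reconstructions also separates the unobserved originals.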
Gene filtering with optimal threshold selection
Gene filtering is a useful preprocessing technique often applied to microarray datasets. However, it is not common practice, because clear guidelines are lacking and it bears the risk of excluding potentially relevant genes. In this work, we propose to model microarray data as a mixture of two Gaussian distributions, which allows us to obtain an optimal filter threshold in terms of the gene expression level.
Affiliations: Bau Macia, Josep. Universidad de Vic; Spain. Sole Casals, Jordi. Universidad de Vic; Spain. Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina. Lew, Sergio Eduardo. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina.
The Barcelona International Conference on Advances in Statistics, Barcelona, Spain. Universidad Autónoma de Barcelona.
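A minimal sketch of the two-Gaussian idea follows: fit a 1-D mixture of two Gaussians to (assumed log-scale) expression values with EM, then place the threshold where the two weighted component densities are equal, i.e. where a gene is equally likely to belong to either component. That equal-posterior criterion is an assumption of this sketch; the abstract does not state the paper's exact optimality criterion. Function names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mu, sd):
    return -0.5 * np.log(2 * np.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)

def fit_two_gaussians(x, iters=200):
    """EM for a 1-D mixture of two Gaussians; returns (weights, means, sds)."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25, 75])           # spread the initial means apart
    sd = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each value (log-space for stability).
        logp = np.stack([np.log(w[k]) + log_normal_pdf(x, mu[k], sd[k]) for k in range(2)])
        logp -= logp.max(axis=0)
        r = np.exp(logp)
        r /= r.sum(axis=0)
        # M-step: update mixing weights, means and standard deviations.
        nk = r.sum(axis=1)
        w = nk / len(x)
        mu = (r @ x) / nk
        sd = np.sqrt((r * (x - mu[:, None]) ** 2).sum(axis=1) / nk) + 1e-6
    return w, mu, sd

def optimal_threshold(w, mu, sd, tol=1e-8):
    """Point between the two means where the weighted densities cross (bisection)."""
    lo_k, hi_k = np.argsort(mu)
    def f(t):
        return (np.log(w[lo_k]) + log_normal_pdf(t, mu[lo_k], sd[lo_k])
                - np.log(w[hi_k]) - log_normal_pdf(t, mu[hi_k], sd[hi_k]))
    a, b = mu[lo_k], mu[hi_k]
    if f(a) * f(b) > 0:                       # no crossing between the means
        return 0.5 * (a + b)
    while b - a > tol:
        m = 0.5 * (a + b)
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return 0.5 * (a + b)
```

Genes whose expression lies below the returned threshold would be the candidates for filtering.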
Decomposition methods for machine learning with small, incomplete or noisy datasets
In many machine learning applications, measurements are sometimes incomplete or noisy, resulting in missing features. In other cases, and for different reasons, the datasets are small from the outset, so more data samples are required to derive useful supervised or unsupervised classification methods. Correctly handling incomplete, noisy or small datasets is a fundamental and classic challenge in machine learning. In this article, we provide a unified review of recently proposed methods based on signal decomposition for missing-feature imputation (data completion), classification of noisy samples and artificial generation of new data samples (data augmentation). We illustrate the application of these signal decomposition methods in diverse practical machine learning examples, including brain-computer interfaces, classification of epileptic intracranial electroencephalogram signals, face recognition/verification and water network data analysis. We show that a signal decomposition approach can provide valuable tools for improving machine learning performance with low-quality datasets.
Affiliations: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina. Sole Casals, Jordi. Center for Advanced Intelligence; Japan. Marti Puig, Pere. University of Catalonia; Spain. Sun, Zhe. RIKEN; Japan. Tanaka, Toshihisa. Tokyo University of Agriculture and Technology; Japan.
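As one generic illustration of decomposition-based data completion (not claimed to be one of the specific methods the article reviews), the sketch below fills missing entries of a data matrix by repeatedly projecting it onto a low-rank approximation via truncated SVD, while keeping the observed entries fixed (sometimes called hard-impute). The rank and iteration count are assumptions, and `lowrank_complete` is a hypothetical name.

```python
import numpy as np

def lowrank_complete(X, mask, rank=2, iters=500):
    """Fill entries of X where mask is False using an iterated rank-`rank` SVD projection.

    X: 2-D array (missing entries may be NaN); mask: boolean array, True = observed.
    Assumes every column has at least one observed value."""
    # Start the missing entries at the column means of the observed values.
    col_mean = np.nanmean(np.where(mask, X, np.nan), axis=0)
    Z = np.where(mask, X, col_mean[None, :])
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Z = np.where(mask, X, low_rank)       # keep observed entries, update missing ones
    return Z
```

The same decompose-then-reconstruct pattern underlies tensor-based variants, where an SVD is replaced by a tensor decomposition.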
Detection of Wind Turbine Failures through Cross-Information between Neighbouring Turbines
In this paper, the time variation of signals from the SCADA systems of several geographically close turbines is analysed and compared. When the turbines operate correctly, their signals show a clear pattern of joint variation. However, a failure in one of the turbines causes the signals from the faulty turbine to decouple from this pattern. Based on this observation, SCADA data are used, first, to derive reference signals describing the pattern and, second, to compare the evolution of the different turbines with respect to this joint variation. This makes it possible to determine whether the assembly is behaving correctly, because the turbines maintain the patterns of correct operation, or whether some of them have decoupled. The presented strategy is very effective and can provide important support for decision making in turbine maintenance and, in the near future, for improving the classification of signals used to train supervised normality models. Besides being effective, the strategy has a low computational cost, so it can add great value to the SCADA data systems already present in wind farms.
Peer reviewed. Sustainable Development Goals: 7 - Affordable and Clean Energy; 7.a - By 2030, enhance international cooperation to facilitate access to clean energy research and technology, including renewable energy, energy efficiency and advanced and cleaner fossil-fuel technology, and promote investment in energy infrastructure and clean energy technology; 7.b - By 2030, expand infrastructure and upgrade technology for supplying modern and sustainable energy services for all in developing countries, in particular least developed countries, small island developing States and landlocked developing countries, in accordance with their respective programmes of support. Postprint (published version).
Serial-EMD: Fast Empirical Mode Decomposition Method for Multi-dimensional Signals Based on Serialization
Empirical mode decomposition (EMD) has developed into a prominent tool for adaptive, scale-based signal analysis in various fields such as robotics, security and biomedical engineering. Because the dramatic increase in the amount of data places higher demands on real-time signal analysis, existing EMD algorithms and their variants struggle to balance growing data dimensionality against analysis speed. To decompose multi-dimensional signals faster, we present a novel signal-serialization method (serial-EMD), which concatenates multivariate or multi-dimensional signals into a one-dimensional signal and decomposes it with any of various one-dimensional EMD algorithms. To verify the effects of the proposed method, we test it on synthetic multivariate time series, artificial 2D images with various textures and real-world facial images. Compared with existing multi-EMD algorithms, the decomposition time is significantly reduced. In addition, facial recognition using Intrinsic Mode Functions (IMFs) extracted with our method achieves higher accuracy than with IMFs from existing multi-EMD algorithms, which demonstrates the superior quality of the IMFs our method produces. Furthermore, this method provides a new perspective for optimizing existing EMD algorithms: transforming the structure of the input signal rather than being constrained to developing envelope computation techniques or signal decomposition methods. In summary, the study suggests that serial-EMD is a highly competitive and fast alternative for multi-dimensional signal analysis.
Affiliations: Zhang, Jin. Nankai University; China. Feng, Fan. Nankai University; China. Marti Puig, Pere. Central University of Catalonia; Spain. Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina. Sun, Zhe. RIKEN; Japan. Duan, Feng. Nankai University; China. Sole Casals, Jordi. Central University of Catalonia; Spain.
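The serialization step can be sketched independently of any particular EMD implementation: flatten the channels (or image rows) into one 1-D signal, run a 1-D decomposition, and cut each component back into the original layout. Plain row-major concatenation is an assumption here; the paper may order or join the segments differently. `toy_decompose` is a deliberately simple placeholder (moving-average trend plus residual), NOT EMD, used only to keep the sketch dependency-free; a real 1-D EMD routine would be passed in its place. All function names are hypothetical.

```python
import numpy as np

def serialize(channels):
    """Concatenate equal-length channels (or image rows) into one 1-D signal."""
    channels = np.asarray(channels, dtype=float)
    return channels.reshape(-1), channels.shape

def deserialize(signal_1d, shape):
    """Cut a 1-D signal (e.g. one IMF) back into the original channel layout."""
    return np.asarray(signal_1d).reshape(shape)

def serial_decompose(channels, decompose_1d):
    """Serialize, decompose with any 1-D method, then restore each component's layout.

    decompose_1d: callable mapping a 1-D array to a list of 1-D components
    (e.g. IMFs from a 1-D EMD implementation), supplied by the caller."""
    flat, shape = serialize(channels)
    return [deserialize(c, shape) for c in decompose_1d(flat)]

def toy_decompose(x, width=5):
    """Placeholder 1-D decomposition (NOT EMD): moving-average trend plus residual detail."""
    trend = np.convolve(x, np.ones(width) / width, mode="same")
    return [x - trend, trend]
```

Because only the input structure changes, any 1-D decomposition can be reused unchanged, which is the perspective the abstract highlights.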
Maximum Likelihood Linear Programming Data Fusion for Speaker Recognition
Biometric system performance can be improved by means of data fusion. Several kinds of information can be fused to obtain a more accurate classification (identification or verification) of an input sample. In this paper we present a method for computing the weights of a weighted-sum fusion of scores by means of a likelihood model. The maximum likelihood estimation is formulated as a linear programming problem. The scores are derived from GMM classifiers, each working on a different feature extractor. Our experimental results assessed the robustness of the system to changes over time (different sessions) and to a change of microphone. The results were significantly better (error bars of two standard deviations) than those of a uniformly weighted sum, a uniformly weighted product or the best single classifier. The computational cost of the proposed method scales with the number of scores to be fused in the same way as the simplex method for linear programming.
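The fusion rule itself is simple once the weights are known; the sketch below shows weighted-sum score fusion and closed-set identification accuracy. The weights would come from the paper's maximum-likelihood linear program, which the abstract does not spell out, so here they are simply passed in. Function names and the score layout `(n_trials, n_speakers)` are assumptions.

```python
import numpy as np

def fuse_scores(score_list, weights):
    """Weighted-sum fusion of per-classifier score matrices.

    score_list: list of arrays (n_trials, n_speakers), one per classifier/feature extractor.
    weights: non-negative weights, one per classifier; normalised here to sum to 1."""
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or len(w) != len(score_list) or np.any(w < 0) or w.sum() == 0:
        raise ValueError("need one non-negative weight per classifier, not all zero")
    w = w / w.sum()
    return sum(wi * np.asarray(s, dtype=float) for wi, s in zip(w, score_list))

def identification_accuracy(fused, true_speaker):
    """Fraction of trials whose highest fused score belongs to the true speaker."""
    return float(np.mean(np.argmax(fused, axis=1) == np.asarray(true_speaker)))
```

Uniform weights reproduce the "uniform weighted sum" baseline mentioned above, and a one-hot weight vector reproduces a single classifier, so both baselines are special cases of the same function.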
Seizure onset zone classification based on imbalanced iEEG with data augmentation.
Objective. Identifying the seizure onset zone (SOZ) in patients with focal epilepsy provides critical information required for surgery. However, collecting this information is challenging, time-consuming and subjective. Some machine learning methods reduce the workload of clinical experts in the visual diagnosis of intracranial electroencephalogram (iEEG) signals, but they face significant challenges because interictal clinical iEEG data often suffer from severe class imbalance. We aim to generate synthetic data for the minority class. Approach. To make the clinically imbalanced data suitable for machine learning, we introduce an EEG augmentation method (EEGAug). EEGAug randomly selects several samples from the minority class and transforms them into the frequency domain. Different frequency bands from different samples are then combined to compose new data. Finally, a synthetic sample is generated by converting the new data back to the time domain. Main results. With this method, the imbalanced clinical iEEG data can be balanced and used with machine learning models. A one-dimensional convolutional neural network is used to classify SOZ and non-SOZ data. We compare EEGAug with other data augmentation methods and with a class-balanced focal loss function, which also addresses data imbalance by adjusting the weights of the minority and majority classes. The results show that EEGAug performs best on most of the data. Significance. Data imbalance is a widespread clinical problem. EEGAug can flexibly generate synthetic data for the minority class, with a high distributional similarity between the synthetic and raw data. With EEGAug, imbalanced clinical data can be used in machine learning models.
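The three EEGAug steps (select minority samples, swap frequency bands between them, return to the time domain) can be sketched with NumPy's real FFT. This is an illustrative single-channel version under assumptions not stated in the abstract: equal-width frequency bands, one randomly chosen donor sample per band, and `eegaug_sample` / `n_bands` as hypothetical names.

```python
import numpy as np

def eegaug_sample(minority, n_bands=4, rng=None):
    """Compose one synthetic single-channel sample from frequency bands of minority samples.

    minority: array (n_samples, n_times) of minority-class (e.g. SOZ) segments.
    n_bands: number of equal-width frequency bands (an assumption of this sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    n_samples, n_times = minority.shape
    spectra = np.fft.rfft(minority, axis=1)          # to the frequency domain
    n_freq = spectra.shape[1]
    edges = np.linspace(0, n_freq, n_bands + 1).astype(int)
    # Randomly pick one donor sample per band (with replacement only if too few samples).
    donors = rng.choice(n_samples, size=n_bands, replace=n_samples < n_bands)
    new_spec = np.empty(n_freq, dtype=complex)
    for b, donor in enumerate(donors):
        new_spec[edges[b]:edges[b + 1]] = spectra[donor, edges[b]:edges[b + 1]]
    return np.fft.irfft(new_spec, n=n_times)          # back to the time domain
```

Calling it repeatedly until the minority class reaches the size of the majority class yields a balanced training set; multichannel iEEG would apply the same band assignment to every channel of a segment, which this sketch leaves out.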
Initialisation of Nonlinearities for PNL and Wiener Systems Inversion
Abstract. This paper proposes a very fast method for blindly initializing a nonlinear mapping that transforms a sum of random variables. The method provides a surprisingly good approximation even when its basic assumption is not fully satisfied. It can be used successfully to initialize the nonlinearity in post-nonlinear mixtures or in Wiener system inversion, improving algorithm speed and convergence.
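The abstract does not spell out the method, so the following is only a sketch of one common idea consistent with it: if the signal entering the unknown nonlinearity is a sum of random variables, it is approximately Gaussian, so an initial inverse nonlinearity can be taken as g = Φ⁻¹ ∘ F̂, where F̂ is the empirical CDF of the observations and Φ the standard normal CDF ("Gaussianization"). This is an assumption, not a statement of the paper's algorithm; `gaussianizing_init` is a hypothetical name. It uses only the Python standard library.

```python
from statistics import NormalDist

def gaussianizing_init(observations):
    """Initial estimate of the inverse nonlinearity, evaluated at each observation.

    Assumes the signal entering the unknown nonlinearity (a sum of random variables)
    is approximately Gaussian, so g = Phi^{-1} o F_hat, where F_hat is the empirical
    CDF of the observations and Phi the standard normal CDF."""
    n = len(observations)
    order = sorted(range(n), key=lambda i: observations[i])
    phi_inv = NormalDist().inv_cdf
    g = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the empirical CDF value strictly inside (0, 1).
        g[i] = phi_inv((rank + 0.5) / n)
    return g
```

The pairs `(observations[i], g[i])` define a monotone, piecewise-linear starting point that a post-nonlinear or Wiener inversion algorithm could then refine; computing it costs only a sort, which fits the "very fast" claim.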